An Improved Hadoop Data Load Balancing Algorithm

Authors

  • Kun Liu
  • Gaochao Xu
  • Jun'e Yuan
Abstract

Data load balancing is one of the key problems in big data technology. Hadoop, a widely used big data platform, has been applied successfully in many settings. Its storage layer, the Hadoop Distributed File System (HDFS), includes a load balancing procedure that evens out the storage load across machines. However, this procedure does not give priority to overloaded racks, so overloaded machines are at risk of breaking down. In this paper, we focus on overloaded machines and propose an improved algorithm that balances overloaded racks preferentially. The improved method builds three lists from a number of factors: PriorBalanceList, which contains the overloaded machines, ForBalanceList, and NextForBalanceList. It then balances the racks selected from these lists first. Experiments show that the improved method balances overloaded racks promptly and reduces the likelihood that these racks break down.
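
The abstract does not say which factors decide list membership, so the following is only an illustrative sketch of the prioritization idea, not the authors' algorithm. It assumes a single factor (storage utilization relative to the cluster average) and a hypothetical `threshold` parameter; the `Rack` class, the `classify_racks` function, and the rule assigning racks to the three lists are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    """Hypothetical rack record; the fields are assumptions for this sketch."""
    name: str
    used: float      # storage in use (any consistent unit)
    capacity: float  # total storage (same unit)

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def classify_racks(racks, threshold=0.10):
    """Split racks into three lists by utilization alone.

    Assumed rule (not taken from the paper):
      - prior_balance:    utilization > cluster average + threshold
      - for_balance:      utilization above the average, within the threshold
      - next_for_balance: everything else
    """
    avg = sum(r.used for r in racks) / sum(r.capacity for r in racks)

    prior_balance, for_balance, next_for_balance = [], [], []
    for rack in racks:
        if rack.utilization > avg + threshold:
            prior_balance.append(rack)
        elif rack.utilization > avg:
            for_balance.append(rack)
        else:
            next_for_balance.append(rack)

    # Handle the most heavily loaded racks first.
    prior_balance.sort(key=lambda r: r.utilization, reverse=True)
    return prior_balance, for_balance, next_for_balance


if __name__ == "__main__":
    cluster = [
        Rack("rack-a", 90, 100),
        Rack("rack-b", 55, 100),
        Rack("rack-c", 20, 100),
        Rack("rack-d", 35, 100),
    ]
    prior, pending, later = classify_racks(cluster)
    print([r.name for r in prior], [r.name for r in pending], [r.name for r in later])
```

A balancer built on this sketch would drain the racks in the first list before touching the others, which is the ordering the abstract describes. The actual method reportedly weighs several factors, so a real implementation would replace the single utilization test with whatever criteria the paper defines.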

Similar resources

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...

A resource aware distributed LSI algorithm for scalable information retrieval

Latent Semantic Indexing (LSI) is one of the popular techniques in the information retrieval field. Unlike traditional information retrieval techniques, LSI is not based simply on keyword matching. It uses statistics and algebraic computations. Based on Singular Value Decomposition (SVD), the higher dimensional matrix is converted to a lower dimensional approximate matrix, of w...

Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment

Big data mining methods support knowledge discovery on highly scalable, high-volume and high-velocity data elements. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis and manages the homogeneous and heterogeneous computing models. The MapReduce fra...

Application of Simulated Annealing to Data Distribution for All-to-All Comparison Problems in Homogeneous Systems

Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provid...

Survey of Parallel Data Processing in Context with MapReduce

MapReduce is a parallel programming model and an associated implementation introduced by Google. In the programming model, a user specifies the computation by two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation, and handles complicated issues like data distribution, load balancing and fault tolerance. The original MapReduce implementation b...

Journal title:
  • JNW

Volume 8, Issue 

Pages -

Publication date: 2013